-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental backups / partial restore #2963
Conversation
added support for readTs for partial restore in Load func receiver. simplified Load function a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @srfrog)
dgraph/cmd/alpha/admin_backup.go, line 54 at r1 (raw file):
since, err = strconv.ParseUint(at, 10, 64) if err != nil { x.SetStatus(w,
Does this actually check that since is greater than zero? 0 is still a valid uint so it will be converted just fine
ee/backup/file_handler.go, line 98 at r1 (raw file):
if err != nil { if glog.V(2) { fmt.Printf("--- Skip: invalid backup name format: %q\n", file)
Why the "---"? Do any existing log lines have them? Otherwise it might just be better to remove them.
ee/backup/handler.go, line 33 at r1 (raw file):
Create(*url.URL, *Request) error // Load will scan location URI for backup files, then load them with loadFunc. Load(*url.URL, uint64, loadFn) error
is it possible to give names to the arguments in these functions? I think it would be a bit more readable.
worker/backup_ee.go, line 95 at r1 (raw file):
// BackupOverNetwork handles a request coming from an HTTP client. func BackupOverNetwork(ctx context.Context, destination string, since uint64) error { ctx, cancel := context.WithCancel(ctx)
is it fine to overwrite ctx here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)
dgraph/cmd/alpha/admin_backup.go, line 54 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Does this actually check that since is greater than zero? 0 is still a valid uint so it will be converted just fine
If we get an error here it means the value was not a proper uint (e.g., at=horse
or at=-10
) . If at= is 0 then none of the logic is trigged so it wont matter. And I don't check values attached to errors, ever.
ee/backup/file_handler.go, line 98 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Why the "---"? Do any existing log lines have them? Otherwise it might just be better to remove them.
Logs can get noisy, the dashes mean "while doing what I said above just now, this happened...". Since it's a printf to stdout, there are no visual artifacts to follow the trail. Imagine 100 files, 1000, etc.
ee/backup/handler.go, line 33 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
is it possible to give names to the arguments in these functions? I think it would be a bit more readable.
This is an interface not a function definition. The Go standard/idiomatic-way is not name the abstract types. The actual function definition (concrete type) will have the arg names and how it uses them. But I will expand the comments which is really what explains how these might work.
worker/backup_ee.go, line 95 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
is it fine to overwrite ctx here?
Yes, a new context is created using the incoming as parent. And the parent is a background context which is a static reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)
ee/backup/file_handler.go, line 98 at r1 (raw file):
Previously, srfrog (Gus) wrote…
Logs can get noisy, the dashes mean "while doing what I said above just now, this happened...". Since it's a printf to stdout, there are no visual artifacts to follow the trail. Imagine 100 files, 1000, etc.
Why not use glog here? It's only printed when glog.V(2), but it doesn't log with glog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @martinmr)
ee/backup/file_handler.go, line 98 at r1 (raw file):
Previously, danielmai (Daniel Mai) wrote…
Why not use glog here? It's only printed when glog.V(2), but it doesn't log with glog?
restore only works in CLI. glog looks ugly because it has timestamp when the rest doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 9 files at r7, 2 of 7 files at r8, 5 of 5 files at r11.
Reviewable status: 14 of 15 files reviewed, 17 unresolved discussions (waiting on @gitlw, @manishrjain, @martinmr, and @srfrog)
ee/backup/backup_test.go, line 52 at r11 (raw file):
conn, err := grpc.Dial("localhost:9180", grpc.WithInsecure()) x.Check(err) dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
FYI, the z.DgraphClientNoDropAll() does exactly this.
ee/backup/backup_test.go, line 86 at r11 (raw file):
} for i := range dirs { x.Check(os.RemoveAll(dirs[i]))
You can't just remove the whole data directory instead of subdirs individually? Are there additional files there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 14 of 15 files reviewed, 17 unresolved discussions (waiting on @codexnull, @gitlw, @manishrjain, and @martinmr)
ee/backup/backup_test.go, line 86 at r11 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
You can't just remove the whole data directory instead of subdirs individually? Are there additional files there?
I thought so too, but docker compose had issues starting without the dir being there. Maybe I'm missing something. So I left data/
with a gitkeep.
…ue-2949_incremental_backups
The max version Ts is used to verify a successful restore and optionally update startTs on zero.
…ue-2949_incremental_backups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 9 files at r7, 10 of 10 files at r12.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @gitlw, @manishrjain, @martinmr, and @srfrog)
ee/backup/run.go, line 119 at r12 (raw file):
grpc.WithBlock(), grpc.WithInsecure(), grpc.WithTimeout(10*time.Second))
grpc.WithTimeout() deprecated according to the docs, so it's probably best to not use it in new code. Can you use context.WithTimeout() here like you do a few lines below?
ee/backup/s3_handler.go, line 222 at r12 (raw file):
} const N = 100
How about a better name for this variable? BATCH_SIZE, MAX_PENDING, ...
ee/backup/s3_handler.go, line 235 at r12 (raw file):
}(i, o) if i%N == 0 { wg.Wait()
Won't this wait after the first object (i == 0, and 0 % anything == 0)? I think you want (i+1)%N instead.
ee/backup/s3_handler.go, line 237 at r12 (raw file):
wg.Wait() } }
Does this loop have a potential deadlock or do I not understand how channels work?
Suppose there are 101 objects. The loop will start all goroutines (i = 0 through 100) and then wait for them to finish. But the rc channel has a buffer length of 1, so 100 of them will block until the rc channel is read, which won't happen because the parent routine is in wg.Wait().
Should the make(chan) have a buffer length of N?
ee/backup/s3_handler.go, line 258 at r12 (raw file):
defer close(doneCh) suffix := "/" + backupManifest
Can you use filepath.Join() here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @codexnull, @gitlw, @manishrjain, and @martinmr)
ee/backup/run.go, line 119 at r12 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
grpc.WithTimeout() deprecated according to the docs, so it's probably best to not use it in new code. Can you use context.WithTimeout() here like you do a few lines below?
Done.
ee/backup/s3_handler.go, line 123 at r8 (raw file):
Previously, srfrog (Gus) wrote…
actually, I dont know why we are using
strings.Compare()
anyway. it's gone.
Done.
ee/backup/s3_handler.go, line 222 at r12 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
How about a better name for this variable? BATCH_SIZE, MAX_PENDING, ...
Done.
ee/backup/s3_handler.go, line 235 at r12 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
Won't this wait after the first object (i == 0, and 0 % anything == 0)? I think you want (i+1)%N instead.
nice catch, fixed
ee/backup/s3_handler.go, line 237 at r12 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
Does this loop have a potential deadlock or do I not understand how channels work?
Suppose there are 101 objects. The loop will start all goroutines (i = 0 through 100) and then wait for them to finish. But the rc channel has a buffer length of 1, so 100 of them will block until the rc channel is read, which won't happen because the parent routine is in wg.Wait().
Should the make(chan) have a buffer length of N?
Yes you are correct, I had moved wg.Wait() outside, that needed to be a time.Sleep() and a larger buffer like you suggested.
ee/backup/s3_handler.go, line 258 at r12 (raw file):
Previously, codexnull (Javier Alvarado) wrote…
Can you use filepath.Join() here as well?
I could but this is an URI path which only uses /
. I use filepath.Join()
in the file handler in case it is an NTFS mount or Dgraph is used on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good for now. I'll work a bit on backups once you merge.
Reviewable status: 15 of 16 files reviewed, 15 unresolved discussions (waiting on @codexnull, @gitlw, @manishrjain, and @martinmr)
ee/backup/run.go, line 119 at r12 (raw file):
grpc.WithTimeout() deprecated according to the docs, so it's probably best to not use it in new code. Can you use context.WithTimeout() here like you do a f
This comment hasn't been addressed, but marked as done.
This PR adds incremental backups by reading manifest files that store the version of previous backups. Only an URI location is needed for the system to deduce what needs to happen. - First backup is a full backup - Any existing backup without a `manifest.json` file is ignored - The version of the most recent backup is used for determining the version to start the current process. - The process is path-driven, so it's simple to start with fresh backups.
This PR adds incremental backups by reading manifest files that store the version of previous backups. Only an URI location is needed for the system to deduce what needs to happen.
manifest.json
file is ignoredThis change is